
fix(qemu): improve VM shutdown with graceful timeouts and PID safety #2479

Merged
smoser merged 1 commit into main from fix/less-errors-on-success on Apr 13, 2026

smoser commented Apr 13, 2026

  • Suppress expected `ExitMissingError` logs when the VM powers off abruptly during SSH shutdown
  • Add a graceful multi-stage shutdown: wait 5s for the process to exit on its own, then send SIGTERM and wait another 5s, then send SIGKILL
  • Store an `*os.Process` instead of a raw PID so a signal can never be delivered to a PID that has since been reused
  • Ensure the QEMU process has exited before `TerminatePod` returns

Fixes the race condition where libvterm builds would show spurious ERRO/WARN messages, while also making shutdown more robust and safe.


Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
smoser merged commit 2f32e77 into main on Apr 13, 2026
64 checks passed
smoser deleted the fix/less-errors-on-success branch on April 13, 2026 23:10
smoser added a commit to smoser/melange that referenced this pull request Apr 14, 2026
…lled

We should not train people or machines to ignore red ERROR messages.
With this change and chainguard-dev#2479, we have zero ERROR log entries in a
successful build.

Previously RetrieveObservabilityEvents always sent three `test -f`
SSH commands to probe for the observability events file, even when the
hook was never installed. Each probe exits non-zero (file not found),
so sendSSHCommand logged three ERROR messages to the console on every
build.

During CPIO generation, scan the base initramfs for the hook's
sentinel file (etc/tetragon/tetragon.tp.d/network-monitor.yaml) and
record the result in cfg.ObservabilityHook. This is accurate regardless
of how the package got into the image — QEMU_ADDITIONAL_PACKAGES,
QEMU_BASE_INITRAMFS, or any other mechanism. RetrieveObservabilityEvents
returns immediately when ObservabilityHook is false, and treats a
missing events file as an error when it is true.

We can now also correctly ERROR when there _was_ an observability hook
installed but its events file is missing, rather than just assuming the
hook was not there.

Store the result of that scan in a sidecar (<cpio>.observability) so a
cached initramfs does not have to be rescanned. The sidecar is
invalidated automatically when the CPIO is newer than it (fresh build,
QEMU_ADDITIONAL_PACKAGES change, etc.).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
smoser added a commit that referenced this pull request Apr 14, 2026